Pipeline Combinators for Gradual AutoML
Automated machine learning (AutoML) can make data scientists more productive. But if machine learning is totally automated, that leaves no room for data scientists to apply their intuition. Hence, data scientists often prefer not total but gradual automation, where they control certain choices and AutoML explores the rest. Unfortunately, gradual AutoML is cumbersome with state-of-the-art tools, requiring large non-compositional code changes. More concise compositional code can be achieved with combinators, a powerful concept from functional programming. This paper introduces a small set of orthogonal combinators for composing machine-learning operators into pipelines. It describes a translation scheme from pipelines and associated hyperparameter schemas to search spaces for AutoML optimizers. On that foundation, this paper presents Lale, an open-source sklearn-compatible AutoML library, and evaluates it with a user study.
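The combinator idea the abstract describes can be illustrated with a toy sketch. This is not Lale's actual API, just a hypothetical miniature: `>>` sequences two operators, and `|` marks a choice point that an AutoML optimizer would search over, so a data scientist can pin down some steps while leaving others open.

```python
# Toy pipeline combinators (illustrative sketch, NOT Lale's real API):
# >>  sequences two operators; |  expresses a choice left to AutoML search.

class Op:
    def __rshift__(self, other):   # composition: self, then other
        return Seq(self, other)
    def __or__(self, other):       # choice point for the optimizer
        return Choice(self, other)

class Prim(Op):
    """A primitive operator wrapping a plain function."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def run(self, x):
        return self.fn(x)
    def candidates(self):
        return [self]

class Seq(Op):
    """Sequential composition; its search space is the cross product."""
    def __init__(self, a, b):
        self.a, self.b = a, b
    def run(self, x):
        return self.b.run(self.a.run(x))
    def candidates(self):
        return [Seq(a, b) for a in self.a.candidates()
                          for b in self.b.candidates()]

class Choice(Op):
    """A choice; its search space is the union of both sides."""
    def __init__(self, a, b):
        self.a, self.b = a, b
    def candidates(self):
        return self.a.candidates() + self.b.candidates()

scale  = Prim("scale",  lambda xs: [x / 10 for x in xs])
square = Prim("square", lambda xs: [x * x for x in xs])
cube   = Prim("cube",   lambda xs: [x ** 3 for x in xs])

# The data scientist fixes the scaling step; AutoML explores the rest.
planned = scale >> (square | cube)
print(len(planned.candidates()))   # 2 concrete pipelines for the optimizer to try
```

The point of the sketch is the translation the paper describes: a planned pipeline with embedded choices expands mechanically into a search space (here, the list returned by `candidates()`) that an optimizer can enumerate or sample.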
PCS Workflow for Veridical Data Science in the Age of AI
Rewolinski, Zachary T., Yu, Bin
Data science is a pillar of artificial intelligence (AI), which is transforming nearly every domain of human activity, from the social and physical sciences to engineering and medicine. While data-driven findings in AI offer unprecedented power to extract insights and guide decision-making, many are difficult or impossible to replicate. A key reason for this challenge is the uncertainty introduced by the many choices made throughout the data science life cycle (DSLC). Traditional statistical frameworks often fail to account for this uncertainty. The Predictability-Computability-Stability (PCS) framework for veridical (truthful) data science offers a principled approach to addressing this challenge throughout the DSLC. This paper presents an updated and streamlined PCS workflow, tailored for practitioners and enhanced with guided use of generative AI. We include a running example to demonstrate the PCS framework in action, and conduct a related case study that showcases the uncertainty in downstream predictions caused by judgment calls in the data cleaning stage.
- South America > Uruguay > Maldonado > Maldonado (0.04)
- North America > United States > California (0.04)
- Europe > France (0.04)
- Asia > Japan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
- Research Report > Strength High (0.68)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
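The stability idea in the PCS abstract above can be made concrete with a small sketch (illustrative only, not the paper's workflow or code): re-run the same downstream analysis under several plausible data-cleaning judgment calls and inspect how much the final estimate moves.

```python
# PCS-style stability check (hypothetical toy example):
# two defensible cleaning choices for the same raw data can yield
# very different downstream estimates, exposing cleaning-sensitivity.
import statistics

raw = [4.1, 3.9, 4.3, 60.0, 4.0, 4.2, None, 3.8]   # toy data: one outlier, one missing value

def clean_drop(xs):
    # Judgment call A: drop missing values and extreme outliers.
    return [x for x in xs if x is not None and x < 10]

def clean_impute(xs):
    # Judgment call B: impute missing values with the median, keep the outlier.
    vals = [x for x in xs if x is not None]
    med = statistics.median(vals)
    return [med if x is None else x for x in xs]

estimates = {name: statistics.mean(f(raw))
             for name, f in [("drop", clean_drop), ("impute", clean_impute)]}
spread = max(estimates.values()) - min(estimates.values())
print(estimates, spread)   # a large spread flags an unstable, cleaning-sensitive result
```

A PCS-style analysis would report results across such perturbations rather than silently committing to one cleaning choice, which is exactly the kind of uncertainty the paper's case study surfaces.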
Risks and Opportunities in Human-Machine Teaming in Operationalizing Machine Learning Target Variables
Guo, Mengtian, Gotz, David, Wang, Yue
Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice due to flawed problem formulation in cases where the prediction target is an abstract concept or construct, and practitioners need to define an appropriate target variable as a proxy to operationalize the construct of interest. The choice of an appropriate proxy target variable is rarely self-evident in practice, requiring both domain knowledge and iterative data modeling. This process is inherently collaborative, involving both domain experts and data scientists. In this work, we explore how human-machine teaming can support this process by accelerating iterations while preserving human judgment. We study the impact of two human-machine teaming strategies on proxy construction: 1) relevance-first: humans leading the process by selecting relevant proxies, and 2) performance-first: machines leading the process by recommending proxies based on predictive performance. Based on a controlled user study of a proxy construction task (N = 20), we show that the performance-first strategy facilitated faster iterations and decision-making, but also biased users towards well-performing proxies that were misaligned with the application goal. Our study highlights the opportunities and risks of human-machine teaming in operationalizing machine learning target variables, yielding insights for future research to explore the opportunities and mitigate the risks.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > North Carolina (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (14 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.69)
- Education > Educational Setting (0.68)
- Materials > Chemicals > Industrial Gases > Liquified Gas (1.00)
- Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (1.00)
- Energy > Oil & Gas > Midstream (1.00)
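The "performance-first" strategy from the proxy-construction study above can be sketched in miniature (a hypothetical illustration, not the study's software): score each candidate proxy target by how predictable it is, then surface the best-scoring one. The sketch also shows the bias the study warns about, since a proxy can score well for reasons unrelated to the application goal.

```python
# Hypothetical "performance-first" proxy recommender: rank candidate proxy
# target variables by the accuracy of a trivial predictor, highest first.

def accuracy(preds, labels):
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def majority_baseline(labels):
    # Simplest possible "model": always predict the most common class.
    pred = max(set(labels), key=labels.count)
    return [pred] * len(labels)

# Toy candidate proxies for an abstract construct such as "patient deterioration"
# (made-up labels for illustration).
proxies = {
    "icu_transfer":  [0, 0, 1, 0, 0, 0, 0, 1],   # rare event: easy for the baseline
    "nurse_flagged": [0, 1, 1, 0, 1, 0, 1, 1],   # balanced: harder to predict
}

scores = {name: accuracy(majority_baseline(y), y) for name, y in proxies.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)   # the top-ranked proxy wins on accuracy, possibly only because
                # its positive class is rare, not because it fits the goal
```

Here the rare-event proxy ranks first purely because a majority-class guess scores well on imbalanced labels, which mirrors how a performance-first recommendation can nudge users toward well-performing but misaligned proxies.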
Training the next generation of physicians for artificial intelligence-assisted clinical neuroradiology: ASNR MICCAI Brain Tumor Segmentation (BraTS) 2025 Lighthouse Challenge education platform
Amiruddin, Raisa, Yordanov, Nikolay Y., Maleki, Nazanin, Fehringer, Pascal, Gkampenis, Athanasios, Janas, Anastasia, Krantchev, Kiril, Moawad, Ahmed, Umeh, Fabian, Abosabie, Salma, Abosabie, Sara, Alotaibi, Albara, Ghonim, Mohamed, Ghonim, Mohanad, Mhana, Sedra Abou Ali, Page, Nathan, Jakovljevic, Marko, Sharifi, Yasaman, Bhatia, Prisha, Manteghinejad, Amirreza, Guelen, Melisa, Veronesi, Michael, Hill, Virginia, So, Tiffany, Krycia, Mark, Petrovic, Bojan, Memon, Fatima, Cramer, Justin, Schrickel, Elizabeth, Kosovic, Vilma, Vidal, Lorenna, Thompson, Gerard, Ikuta, Ichiro, Albalooshy, Basimah, Nabavizadeh, Ali, Tahon, Nourel Hoda, Shekdar, Karuna, Bhatia, Aashim, Kirsch, Claudia, D'Anna, Gennaro, Lohmann, Philipp, Nour, Amal Saleh, Myronenko, Andriy, Goldman-Yassen, Adam, Reid, Janet R., Aneja, Sanjay, Bakas, Spyridon, Aboian, Mariam
The creation of high-quality reference-standard image data by neuroradiology experts for automated clinical tools can be a powerful vehicle for neuroradiology & artificial intelligence education. We developed a multimodal educational approach for students and trainees during the MICCAI Brain Tumor Segmentation Lighthouse Challenge 2025, a landmark initiative to develop accurate brain tumor segmentation algorithms. Fifty-six medical students & radiology trainees volunteered to annotate brain tumor MR images for the BraTS challenges of 2023 & 2024, guided by faculty-led didactics on neuropathology MRI. Among the 56 annotators, 14 select volunteers were then paired with neuroradiology faculty for guided one-on-one annotation sessions for BraTS 2025. Lectures on neuroanatomy, pathology & AI, journal clubs & data scientist-led workshops were organized online. Annotators & audience members completed surveys on their perceived knowledge before & after annotations & lectures, respectively. Fourteen coordinators, each paired with a neuroradiologist, completed the data annotation process, averaging 1322.9+/-760.7 hours per dataset per pair and 1200 segmentations in total. On a scale of 1-10, annotation coordinators reported a significant increase in familiarity with image segmentation software pre- vs. post-annotation, from an initial average of 6+/-2.9 to a final average of 8.9+/-1.1, and a significant increase in familiarity with brain tumor features, from an initial average of 6.2+/-2.4 to a final average of 8.1+/-1.2. We demonstrate an innovative offering for providing neuroradiology & AI education through an image segmentation challenge to enhance understanding of algorithm development, reinforce the concept of a data reference standard, and diversify opportunities for AI-driven image analysis among future physicians.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- North America > United States > South Carolina > Charleston County > Charleston (0.14)
- North America > United States > Missouri > Boone County > Columbia (0.14)
- (29 more...)
- Instructional Material > Course Syllabus & Notes (1.00)
- Research Report (0.83)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Getting In Contract with Large Language Models -- An Agency Theory Perspective On Large Language Model Alignment
Kaltenpoth, Sascha, Müller, Oliver
Adopting large language models (LLMs) in organizations can potentially revolutionize how we live and work. However, LLMs can generate off-topic, discriminatory, or harmful content. This AI alignment problem often stems from misspecifications during LLM adoption that go unnoticed by the principal due to the LLM's black-box nature. While various research disciplines have investigated AI alignment, they neither address the information asymmetries between organizational adopters and black-box LLM agents nor consider organizational AI adoption processes. Therefore, we propose LLM ATLAS (LLM Agency Theory-Led Alignment Strategy), a conceptual framework grounded in agency (contract) theory, to mitigate alignment problems during organizational LLM adoption. We conduct a conceptual literature analysis using the organizational LLM adoption phases and agency theory as concepts. Our approach results in (1) an extended literature analysis process specific to AI alignment methods during organizational LLM adoption and (2) a first LLM alignment problem-solution space.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)
- Overview (1.00)
- Research Report (0.64)